Implement the parser
We will start with a small language that has not many features. A programmer can specify one of the following:
- print a string literal, or
- evaluate an arithmetic expression
1 Grammar
Here is the grammar for the initial version of our language:
\[ \definecolor{myblue}{RGB}{37, 104, 145} \newcommand{gt}[1]{{\color{myblue}\underline{#1}}\quad} \newcommand{gnt}[1]{\mathit{#1}\quad} \newcommand{\Program}{\gnt{Program}} \newcommand{\STR}{\gt{STR}} \newcommand{\Expr}{\gnt{Expr}} \newcommand{\INT}{\gt{INT}} \newcommand{\prod}{\rightarrow\quad} \newcommand{\alt}{\ |\ \quad} \]
\[ \begin{align} \Program \prod & \gt{print} \STR \\ \alt & \Expr \\ \\ \Expr \prod & \Expr \gt{+} \Expr \\ \alt & \Expr \gt{-} \Expr \\ \alt & \Expr \gt{*} \Expr \\ \alt & \gt{(} \Expr \gt{)} \\ \alt & \INT \end{align} \]
A few notes about the grammar:
- \(\STR\) is a string literal, and \(\INT\) is an integer literal.
- Whitespace is insignificant in our concrete syntax.
- The grammar is ambiguous, but we intend for the usual associativity and precedence to apply for mathematical expressions.
2 Implement the parser
We will continue to use Alex to generate the lexer and Happy to generate the parser. Create the corresponding files and add them to your repo. You will also need to define data structures for the tokens and for the AST.
You can name and organize your files as you see fit!
Consider implementing some tests for the lexer and parser, before implementing the lexer and parser themselves!
You can use an implementation from a previous assignment / lab for inspiration!
String literals in a programming language can get quite complicated. I recommend we keep it simple for now: A string is anything that appears between two quotation marks.
Perhaps the simplest implementation is:
\" .* \" { TokenString . read }
where TokenString is the name of the constructor for your token. Haskell’s read function will “unescape” any escape characters that appear in the string.
This implementation probably isn’t sophisticated enough for a production programming language, but it’s quick and easy and will do for version 1.0 of our compiler!
If you are interested in more sophisticated string parsing, here are a couple of articles that discuss more advanced features of Alex:
Once you have implemented the lexer and parser, the code should build and not mention parser conflicts during the build process.
3 Add phases
Add phases to the compiler for lexing and parsing.
You can use logging or other means (for example, just printing the token list or AST) to make sure that your lexing and parsing phases are working as you intend.